Stage-differentiated ensemble modeling of DNA methylation landscapes uncovers salient biomarkers and prognostic signatures in colorectal cancer progression

Background: Aberrant DNA methylation acts epigenetically to skew gene transcription rates up or down, contributing to cancer etiology. A gap in our understanding concerns the epigenomics of stagewise cancer progression. In this study, we have developed a comprehensive computational framework for the stage-differentiated modelling of DNA methylation landscapes in colorectal cancer (CRC).
Methods: The methylation β-matrix was derived from public-domain TCGA data, converted into an M-value matrix, annotated with AJCC stages, and analysed for stage-salient genes using an ensemble of approaches involving stage-differentiated modelling of methylation patterns and/or expression patterns. Differentially methylated genes (DMGs) were identified by contrasting cancer samples against controls (adjusted p-value < 0.001 and |log fold-change of M-value| > 2), and then filtered using all possible pairwise stage contrasts (p-value < 0.05) to obtain stage-salient DMGs. These were subjected to a consensus analysis, then matched with clinical data, and evaluated by Kaplan–Meier survival analysis to assess the impact of the methylation patterns of consensus stage-salient biomarkers on disease prognosis.
Results: We found significant genome-wide changes in methylation patterns in cancer cases relative to controls, irrespective of stage. The stage-differentiated models yielded the following consensus salient genes: one stage-I gene (FBN1), one stage-II gene (FOXG1), one stage-III gene (HCN1) and four stage-IV genes (NELL1, ZNF135, FAM123A, LAMA1). All the biomarkers were significantly hypermethylated in their promoter regions, suggesting down-regulation of expression and implying a putative CpG island methylator phenotype (CIMP) manifestation. A prognostic signature consisting of FBN1 and FOXG1 survived all the analytical filters and represents a novel early-stage epigenetic biomarker/target.
Conclusions: We have designed and executed a workflow for stage-differentiated epigenomic analysis of colorectal cancer progression, and identified several stage-salient diagnostic biomarkers and an early-stage prognostic biomarker panel. The study has led to the discovery of an alternative CIMP-like signature in colorectal cancer, reinforcing the role of CIMP drivers in tumor pathophysiology.
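The contrast-against-controls step in the Methods can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes the standard conversion M = log2(β/(1−β)), with β clipped away from 0 and 1. It also assumes Benjamini–Hochberg adjustment, because the abstract does not name the correction method. The log fold-change is taken as the difference of group-mean M-values (cancer minus control), and per-gene raw p-values are assumed to be precomputed, for example by a moderated t-test. The gene names and numbers in the example are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def beta_to_m(beta: float, eps: float = 1e-6) -> float:
    """Convert a methylation beta value (0..1) to an M-value, log2(beta / (1 - beta)).

    Assumption: beta is clipped to [eps, 1 - eps] so that fully methylated or
    unmethylated values do not produce infinite M-values.
    """
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (assumed correction method)."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone and <= 1.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_dmgs(
    mean_m_cancer: Dict[str, float],
    mean_m_control: Dict[str, float],
    pvalues: Dict[str, float],
    padj_cutoff: float = 0.001,
    lfc_cutoff: float = 2.0,
) -> List[str]:
    """Keep genes with adjusted p < padj_cutoff AND |logFC of M-value| > lfc_cutoff.

    M-values are already on a log2 scale, so logFC is the difference of group means.
    """
    genes = sorted(pvalues)
    padj = bh_adjust([pvalues[g] for g in genes])
    selected = []
    for gene, q in zip(genes, padj):
        lfc = mean_m_cancer[gene] - mean_m_control[gene]
        if q < padj_cutoff and abs(lfc) > lfc_cutoff:
            selected.append(gene)
    return selected


# Hypothetical example: three made-up genes with group-mean M-values and raw p-values.
cancer = {"GENE_A": 2.5, "GENE_B": 0.3, "GENE_C": 1.0}
control = {"GENE_A": -0.5, "GENE_B": 0.0, "GENE_C": -2.0}
raw_p = {"GENE_A": 1e-6, "GENE_B": 1e-8, "GENE_C": 0.2}
print(select_dmgs(cancer, control, raw_p))  # ['GENE_A']
```

In this toy example, GENE_B is rejected despite its very small p-value because its M-value shift (0.3) falls below the fold-change cutoff. GENE_C is rejected on the adjusted p-value. Applying both criteria jointly guards against calling statistically significant but biologically negligible shifts.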

>>> (2) We have complied with all the suggestions made regarding our submission. As a proactive measure, we have reworked the entire set of figures and replaced it with a new, more compact set. This has been done in the following manner: (i) Figures 1, 2. The manuscript has been updated accordingly. We request that the reviewers use the links to the high-resolution figures in the manuscript PDF.
>>> (3) To reflect all tracked changes since the original manuscript submission, the changes have been color-coded as follows: blue for revision 1, red for revision 2, and orange for changes made after the appeal.

Revision R2: Response to Reviewers:
Academic Editor: "The effort toward improving the text led to a significant improvement of the manuscript. Unfortunately, however, the quality of most figures is still very poor: the images are extremely blurred and the numbers difficult to read, which makes them, in most cases, useless to the reader. Unless this problem is correctly addressed and solved, the manuscript cannot be accepted for publication."
>>> We would like to thank the Editor and Reviewers for their comments. We have updated all the figures again to ensure maximum clarity, and have also run each individual figure through the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool (https://pacev2.apexcovantage.com/) to ensure that every figure meets PLOS requirements. We can assure you that all figures meet these requirements. If any figure is still found wanting, please let us know which one and we will rectify it immediately.

Revision R1: Response to reviewers
>>>At the outset, we would like to thank the reviewers and the Editor for their valuable comments.
Academic Editor: The manuscript has been reviewed by two experts in the field, both of whom found it quite good and of interest, despite some problems, highlighted in particular by R.2, that must be addressed before it can be considered for publication.
>>> We have addressed the points raised by Reviewer #2, substantially revised the manuscript, and expanded the scope of our investigations and discussion.
1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming.
>>> Thank you; we have done so.
2. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supportinginformation.
>>> Thank you, this has been done.
Reviewer #1: In this paper, Muthamilselvan et al. developed a comprehensive computational framework for stage-differentiated modelling of DNA methylation landscapes in CRC, found significant changes, and discovered a novel CIMP-like signature bearing potential clinical significance.
The data are supported by a strong statistical analysis.
The paper can be accepted for publication.
Minor point: on page 3, the word "Tomczak" must be deleted.
>>> Thank you, it was inadvertent and has been deleted. We would like to thank Reviewer #1 for their time and the kind comments.
Reviewer #2: In this study, the Authors propose a computational workflow for the analysis of DNA methylation aberrations in colorectal cancer (CRC) from a stage-differentiated perspective. Data have been collected from The Cancer Genome Atlas (TCGA) portal; as a result, the Authors identify 7 stage-characteristic genes that could be indicative of a novel CpG island methylator phenotype (CIMP). The manuscript provides an interesting perspective and a detailed explanation of how DNA methylation data could be analyzed to detect epigenetic signatures between normal and tumoral samples or in different stages of the disease. The bioinformatic procedures are well described and depict a useful workflow for handling big and complex data from repositories such as TCGA. Unfortunately, however, I cannot avoid pointing out that the work has serious shortcomings that preclude its acceptance for publication in the present version and must be corrected. Importantly, even if interesting, the data are presented in a confused manner; in particular, the Authors should better define the aim of the study and the experimental design, carefully organize the Results, improve the Discussion and avoid exaggerated conclusions, in particular inferences drawn from data that should be better supported by experimental validation. Specific comments are listed below.
>>> We would like to thank Reviewer #2 for the careful reading of our paper and the critical comments. We have addressed all of the many valid points in the present revision, undertaking a major revision of the manuscript in line with the suggestions.
1. The Authors should carefully revise the text and correct some grammar mistakes.
6. The correlation analyses between methylation and gene expression data provide interesting information but should be better described, focusing attention not only on the methodological procedure used but also on the biological meaning of the observed results.
>>> This has now been rectified. We now show plots for only the stage-salient genes, to make the biological connections and meaning clear. We note that all the results from our investigations are available in the Supplementary Files.
7. In the Conclusions the Authors write: "All the stage-salient genes were found to be hypermethylated, indicating a novel CIMP-like character possibly promoting epigenetic destabilisation, which in turn would drive the progression of colorectal cancer". First, it is not clear where the hypermethylation associated with these stage-salient genes is located (promoter, TSS, CpG island or gene body); this should be better explained. Then, the role of the stage-salient genes identified by the Authors should be better characterized in the context of CRC to indicate a possible novel CIMP-like phenotype; I would suggest that the Authors enrich the Discussion by adding more details and experimental evidence of the involvement of these genes in CRC pathogenesis.
>>> We have now strengthened the literature support for these statements and the discussion. We have included a new Table 8 listing the locations of the DM probes, and a new Figure 19 to support these assertions. We also found a new publication citing stage-IV specificity for FAM123A while this manuscript was under review (a medRxiv preprint of our work has been available since October 2020); this has been included in the References.